| station | valid | lat | lon | elevation | tmpf | dwpf | relh | drct | sknt | p01i | alti | mslp | vsby | gust | skyc1 | skyc2 | skyc3 | skyc4 | skyl1 | skyl2 | skyl3 | skyl4 | wxcodes | ice_accretion_1hr | ice_accretion_3hr | ice_accretion_6hr | peak_wind_gust | peak_wind_drct | peak_wind_time | feel | snowdepth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cat | datetime[μs] | f32 | f32 | f32 | f32 | f32 | f32 | f32 | f32 | f32 | f32 | f32 | f32 | f32 | cat | cat | cat | cat | f32 | f32 | f32 | f32 | str | f32 | f32 | f32 | f32 | f32 | datetime[μs] | f32 | f32 |
| "HQG" | 2010-09-16 18:00:00 | 37.163101 | -101.370499 | 956.52002 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
| "LBL" | 2019-09-15 12:00:00 | 37.044201 | -100.9599 | 879.0 | 64.5 | 57.666668 | 78.71167 | 198.333328 | 7.666667 | 0.0 | 30.139999 | 1017.383362 | 10.0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | 64.48333 | null |
| "GCK" | 2016-03-10 18:00:00 | 37.927502 | -100.724403 | 881.0 | 60.333332 | 29.833334 | 34.814999 | 152.0 | 7.166667 | 0.0 | 30.16 | 1020.68335 | 10.0 | 16.0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | 60.32 | null |
| "GCK" | 2019-09-19 00:00:00 | 37.927502 | -100.724403 | 881.0 | 86.0 | 52.5 | 32.200001 | 186.0 | 4.166667 | 0.0 | 29.926666 | 1009.833313 | 10.0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | 84.199997 | null |
| "19S" | 2016-08-27 00:00:00 | 37.496899 | -100.832901 | 892.570007 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
Predictive Modeling of Weather Station Data:
Linear Regression vs. Graph Neural Network
Slides: slides.html
Introduction
Accurate weather prediction is a crucial task with widespread implications across agriculture, transportation, disaster preparedness, and energy management. Traditional forecasting methods often rely on statistical models or physics-based simulations; however, with the advancement of graph neural networks (GNNs), we believe a more modern deep learning approach holds promise (Lam et al. 2023; Keisler 2022).
In this project, we explore the predictive power of a traditional linear regression model and a GNN on real-world weather station data (Herzmann 2023). Our aim is to evaluate whether the GNN’s ability to incorporate spatial relationships between stations offers a measurable advantage over more conventional techniques.
The dataset consists of multiple weather stations located within the same geographic region. Each station collects meteorological variables over time, and can be represented as a node within a broader spatial network. For the linear model baseline, a single model will be trained using all stations’ data aggregated per feature for each time step.
For the GNN, the model will be trained on the entire network of stations, where each node corresponds to a station and edges represent spatial relationships. The graph is encoded via a dense adjacency matrix, excluding self-connections. The GNN aims to leverage the inherent spatial structure of the data, potentially capturing regional weather patterns and inter-station dependencies that are invisible to traditional models (Li et al. 2018).
Our evaluation focuses on forecasting performance over the last six months of the dataset. We assess how well each modelling approach predicts key weather variables and investigate the conditions under which one model may outperform the other.
Methods
1. Cleaning Process
The raw weather station data underwent a multistage cleaning and preprocessing procedure designed to ensure temporal consistency, handle missing values, and prepare the data for both the linear and GNN-based models. The steps are as follows:
Temporal Alignment: Missing time steps were filled for all stations to ensure uniform time coverage across the selected date range.
Temporal Compression: Data was downsampled from 1-hour intervals to 6-hour intervals to reduce noise and improve model efficiency.
Feature Pruning: Features missing more than 10% of their values in over 6 out of 8 stations were removed.
Station Filtering: Stations missing more than two years of valid data were excluded to ensure consistency across nodes (some stations were newer than others).
Correlation Analysis: Performed both between-station and within-station correlation analysis to better understand the spatial and temporal dependencies among features.
Feature Scaling: Remaining features were scaled using appropriate normalization techniques (e.g., min-max, robust scaling) to facilitate model training.
Data Reshaping: The final dataset was transformed into a 3D array of shape (time, station, feature), serving as the unified input format for both the linear and GNN models.
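The compression and reshaping steps can be sketched as follows; the array sizes and variable names here are illustrative stand-ins, not the project's actual dimensions:

```python
import numpy as np

# Stand-in for the raw hourly array: (time, station, feature).
rng = np.random.default_rng(0)
hourly = rng.normal(size=(24 * 30, 7, 5))  # 30 days of hourly data

# Temporal compression: average non-overlapping 6-hour blocks.
n_blocks = hourly.shape[0] // 6
six_hourly = hourly[: n_blocks * 6].reshape(n_blocks, 6, 7, 5).mean(axis=1)

print(six_hourly.shape)  # (time, station, feature) = (120, 7, 5)
```

Averaging each 6-hour block (rather than subsampling) smooths out hour-to-hour noise while preserving the block's overall level.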
2. Linear Model
The linear baseline model was framed as a time-series regression task with a single target: it uses the preceding 28 time steps (equivalent to 7 days) to predict the target variable tmpf at the next time step.
Model Input:
At each time step t, the input vector aggregates the five base features across all stations:
\(\widehat{\text{tmpf}}_{t+1}=f(\text{features}_t,\ \text{features}_{t-1},\ \dots,\ \text{features}_{t-27})\)
Where:
\(\text{features}_t=\{\text{tmpf},\ \text{relh},\ \text{sknt},\ \text{drct}_{\sin},\ \text{drct}_{\cos}\}\)
- tmpf: Temperature
- relh: Relative Humidity
- sknt: Wind Speed
- \(\text{drct}_{\sin},\ \text{drct}_{\cos}\): Wind direction encoded as sine and cosine components
This flattened representation allows a standard linear regression model to be trained on a fixed-length vector for each prediction target and simplifies the graph structure of the weather stations.
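A minimal sketch of this flattened setup, using NumPy least squares as a stand-in for the actual regression implementation (the random data and variable names are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
T, F, LAGS = 500, 5, 28          # time steps, per-step features, history length
X_agg = rng.normal(size=(T, F))  # features already aggregated across stations
tmpf = X_agg[:, 0]               # assume column 0 holds the target temperature

# Build the flattened design matrix: each row stacks the 28 preceding steps.
X = np.asarray([X_agg[t - LAGS:t].ravel() for t in range(LAGS, T)])
y = tmpf[LAGS:]                  # next-step target

# Ordinary least squares with an intercept column.
A = np.c_[np.ones(len(X)), X]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef
print(X.shape)  # (472, 140): 28 lags x 5 features per prediction
```

Note that 28 lags of 5 features yield a 140-dimensional input vector.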
3. GNN
The GNN is structured to model the spatiotemporal dynamics of the weather station network. It is implemented in PyTorch, and the architecture is inspired by the Diffusion Convolutional Recurrent Neural Network (DCRNN) (Li et al. 2018).
Graph Structure and Input Format
Graph Representation: Each weather station is represented as a node, and spatial relationships are encoded in a dense adjacency matrix (excluding self-loops)
Data Format: The input is formatted using the StaticGraphTemporalSignal class from torch-geometric-temporal, which supports temporal sequences on a static graph.
Train/Validation/Test Split:
Testing: Final 6 months of data (730 time steps at 6-hour resolution)
Training: First 80% of the remaining data
Validation: Final 20% of the remaining data
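Under the step counts stated above, the chronological split works out as follows (a sketch; the variable names are assumptions):

```python
# 6-hour resolution: 730 steps in the final 6 months of a 4,381-step dataset.
n_total = 4381
n_test = 730                   # final 6 months held out for testing
n_rest = n_total - n_test      # 3651 steps remain for train/validation
n_train = int(n_rest * 0.8)    # first 80% of the remainder
n_val = n_rest - n_train       # final 20% of the remainder

train_idx = range(0, n_train)
val_idx = range(n_train, n_rest)
test_idx = range(n_rest, n_total)
print(n_train, n_val, n_test)  # 2920 731 730
```

Keeping the split strictly chronological avoids leaking future observations into training.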
Model Architecture
The model consists of three stacked DCRNN layers followed by a linear projection layer:
DCRNN(140, 64) -> ReLU -> DCRNN(64, 32) -> ReLU -> DCRNN(32, 32) -> ReLU -> Linear(32, 1)
DCRNN Layers: Capture both temporal patterns as well as spatial diffusion through the graph structure.
ReLU Activations: Introduce nonlinearity after each recurrent layer.
Linear Output: Maps the final hidden state to the predicted temperature at the next time step
Training Configuration:
Optimizer: Adam
Learning Rate: Initial rate of 0.01, reduced by a factor of 0.1 upon plateau, with a minimum of \(10^{-5}\)
Epochs: Trained for up to 100 epochs with early stopping based on validation loss plateau
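The learning-rate schedule described above can be sketched in plain Python (the patience value is an assumption; in practice this role is typically filled by PyTorch's ReduceLROnPlateau):

```python
# Reduce the learning rate by a factor of 0.1 when validation loss stops
# improving, floored at the 1e-5 minimum stated above.
def plateau_schedule(val_losses, lr=0.01, factor=0.1, min_lr=1e-5, patience=3):
    best, wait, history = float("inf"), 0, []
    for loss in val_losses:
        if loss < best - 1e-8:       # improvement: reset the patience counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait > patience:      # plateau: shrink the learning rate
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

lrs = plateau_schedule([1.0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9])
print(lrs[-1])  # 0.001 after one plateau-triggered reduction
```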
The GNN, like the linear model, uses the preceding 28 time steps to forecast the next temperature value. However, it is equipped to model both temporal trends and spatial interdependencies, which the linear model is not explicitly designed to capture.
Analysis and Results
Data Exploration and Visualization
1. Data Source:
The dataset used in this project was sourced from the Iowa Environmental Mesonet (IEM) hosted by Iowa State University (Herzmann 2023). The data follows observational standards set by the FAA Automated Surface Observing System (ASOS) (Administration 2021). For this project, we selected a subset covering a 10-year period from 2010 to 2020, focusing on nine weather stations located in southwestern Kansas. These stations were chosen based on geographic proximity and consistency of data availability.
2. Data Structure
The original dataset contains:
33 features
8 stations
96,408 hourly time steps
Intermittent missing values across both time and stations
A subset of key features relevant to this project is summarized below:
| Feature | Description |
|---|---|
| station | Station identifier code (3-4 characters) |
| valid | Timestamp of the observation |
| lon | Longitude |
| lat | Latitude |
| elevation | Elevation (meters) |
| tmpf | Air temperature (F) |
| relh | Relative humidity (%) |
| drct | Wind direction (degrees) |
| sknt | Wind speed (knots) |
| p01i | Precipitation over the previous hour (inches) |
| vsby | Visibility (miles) |
3. Exploratory Data Analysis (EDA)
Initial exploratory analysis focused on filtering out low-quality data and reducing the dimensionality of the dataset:
Features with more than 10% missing values were removed.
Stations with excessive missing data during the 2010-2020 window were dropped. One station was excluded entirely as it was introduced after 2020, and thus had no data within the selected range.
The remaining dataset was evaluated to ensure sufficient temporal coverage and consistency across stations and features.
Based on this visualization, a date range of 2018 to 2020 was selected, as this range had the most valid features and stations while also being quite recent. With this range selected, the ULS station was dropped, resulting in seven valid stations.
| station | valid | lat | lon | elevation | tmpf | dwpf | relh | sknt | feel | drct_sin | drct_cos |
|---|---|---|---|---|---|---|---|---|---|---|---|
| cat | datetime[μs] | f32 | f32 | f32 | f32 | f32 | f32 | f32 | f32 | f64 | f64 |
| "GCK" | 2018-01-01 00:00:00 | 37.927502 | -100.724403 | 881.0 | 9.283334 | -6.5 | 48.148335 | 8.666667 | -4.161667 | 0.851117 | 0.524977 |
| "LBL" | 2018-01-01 00:00:00 | 37.044201 | -100.9599 | 879.0 | 12.316667 | -1.983333 | 52.014999 | 10.166667 | -1.721667 | 0.664796 | 0.747025 |
| "EHA" | 2018-01-01 00:00:00 | 37.000801 | -101.879997 | 1099.0 | 15.555555 | 5.255556 | 63.382778 | 7.388889 | 4.482222 | 0.970763 | 0.24004 |
| "HQG" | 2018-01-01 00:00:00 | 37.163101 | -101.370499 | 956.52002 | 14.311111 | -1.605556 | 48.468334 | 7.777778 | 2.681667 | 0.936332 | 0.351115 |
| "3K3" | 2018-01-01 00:00:00 | 37.991699 | -101.7463 | 1005.700012 | 13.1 | -0.9 | 53.127777 | 6.777778 | 2.151111 | 0.981255 | 0.192712 |
| … | … | … | … | … | … | … | … | … | … | … | … |
| "EHA" | 2020-12-31 00:00:00 | 37.000801 | -101.879997 | 1099.0 | 42.355556 | 16.588888 | 34.955555 | 3.0 | 40.254444 | -0.533205 | -0.845986 |
| "HQG" | 2020-12-31 00:00:00 | 37.163101 | -101.370499 | 956.52002 | 40.722221 | 13.255555 | 32.23111 | 1.666667 | 39.450001 | 0.977334 | -0.211704 |
| "3K3" | 2020-12-31 00:00:00 | 37.991699 | -101.7463 | 1005.700012 | 40.200001 | 12.2 | 31.377777 | 4.555555 | 36.683334 | -0.700217 | -0.71393 |
| "JHN" | 2020-12-31 00:00:00 | 37.578201 | -101.7304 | 1012.710022 | 40.711113 | 17.4 | 38.554443 | 4.777778 | 37.06889 | -0.824675 | -0.565607 |
| "19S" | 2020-12-31 00:00:00 | 37.496899 | -100.832901 | 892.570007 | 39.922222 | 15.388889 | 36.552223 | 6.111111 | 35.113335 | -0.824675 | 0.565607 |
4. Graph Creation
To prepare the dataset for graph-based modeling, a spatial graph was constructed:
Each station was treated as a node.
A dense adjacency matrix (excluding self-connections) was created by computing geodesic distances between stations.
Edge weights were defined as the inverse of the geodesic distance, scaled to a [0, 1] range with min-max scaling. The closer two stations are, the stronger their connection in the graph.
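A sketch of this construction, using the haversine formula as an approximation of the geodesic distance and three of the stations listed above:

```python
import numpy as np

# Station coordinates (lat, lon) taken from the data preview tables.
coords = np.array([
    [37.927502, -100.724403],  # GCK
    [37.044201, -100.9599],    # LBL
    [37.000801, -101.879997],  # EHA
])

def haversine_km(a, b):
    """Haversine great-circle distance; approximates the geodesic distance."""
    lat1, lon1, lat2, lon2 = map(np.radians, (*a, *b))
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(h))

n = len(coords)
dist = np.array([[haversine_km(coords[i], coords[j]) if i != j else np.inf
                  for j in range(n)] for i in range(n)])
w = 1.0 / dist                           # inverse distance; diagonal becomes 0
w = (w - w.min()) / (w.max() - w.min())  # min-max scale to [0, 1]
print(np.round(w, 3))
```

The infinite diagonal makes self-connection weights exactly zero, matching the "no self-loops" convention above.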
5. Spatiotemporal Imputation
Missing values were imputed through a two-stage process leveraging both spatial and temporal structure:
Spatial Imputation: Each missing value was estimated based on the value of neighboring nodes within the same time step, weighted by graph connectivity.
Temporal Imputation: Remaining gaps were filled by interpolating along the time axis for each node individually.
While not a perfect method, this approach produced plausible and continuous data, as visually confirmed during quality checks and shown below.
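The two-stage imputation can be sketched for a single feature as follows (the weight matrix and data are illustrative placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(10, 3))   # one feature over (time, station)
w = np.array([[0.0, 0.8, 0.3],
              [0.8, 0.0, 0.5],
              [0.3, 0.5, 0.0]])  # graph edge weights between the 3 stations
x[4, 1] = np.nan               # a missing observation at t=4, station 1

# Stage 1 -- spatial: weighted average of the other stations at the same step.
t, s = 4, 1
known = ~np.isnan(x[t])
x[t, s] = np.average(x[t, known], weights=w[s, known])

# Stage 2 -- temporal: linear interpolation along time for any remaining gaps.
for col in range(x.shape[1]):
    series = x[:, col]
    gaps = np.isnan(series)
    if gaps.any():
        series[gaps] = np.interp(np.flatnonzero(gaps),
                                 np.flatnonzero(~gaps), series[~gaps])

print(np.isnan(x).sum())  # 0: all gaps filled
```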
6. Correlation Analysis
To avoid feature redundancy and data leakage, a correlation analysis was conducted:
Inter-node and Intra-node correlations were computed.
Two features, dwpf (dew point, °F) and feel (feels-like temperature, °F), showed high correlation with the target variable tmpf (air temperature, °F). Since both are partially derived from tmpf, they were removed to maintain model integrity and avoid leakage.
7. Final Preparation
After all preprocessing steps, the final dataset was reduced and standardized:
5 features were retained: tmpf, relh, sknt, drct_sin, drct_cos
7 stations remained after filtering
4,381 time steps at 6-hour intervals (equivalent to 3 years of data)
The remaining unscaled features were scaled using the RobustScaler from scikit-learn to mitigate the influence of outliers while preserving the overall data distribution.
shape: (30_667, 6)
┌─────────┬───────────┬──────────┬───────────┬───────────┬───────────┐
│ station ┆ tmpf ┆ relh ┆ sknt ┆ drct_sin ┆ drct_cos │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ cat ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════════╪═══════════╪══════════╪═══════════╪═══════════╪═══════════╡
│ GCK ┆ -1.355721 ┆ 0.481483 ┆ -0.080357 ┆ 0.851117 ┆ 0.524977 │
│ LBL ┆ -1.265174 ┆ 0.52015 ┆ 0.160714 ┆ 0.664796 ┆ 0.747025 │
│ EHA ┆ -1.168491 ┆ 0.633828 ┆ -0.285714 ┆ 0.970763 ┆ 0.24004 │
│ HQG ┆ -1.205638 ┆ 0.484683 ┆ -0.223214 ┆ 0.936332 ┆ 0.351115 │
│ 3K3 ┆ -1.241791 ┆ 0.531278 ┆ -0.383929 ┆ 0.981255 ┆ 0.192712 │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ EHA ┆ -0.368491 ┆ 0.349556 ┆ -0.991072 ┆ -0.533205 ┆ -0.845986 │
│ HQG ┆ -0.417247 ┆ 0.322311 ┆ -1.205357 ┆ 0.977334 ┆ -0.211704 │
│ 3K3 ┆ -0.432836 ┆ 0.313778 ┆ -0.741072 ┆ -0.700217 ┆ -0.71393 │
│ JHN ┆ -0.417579 ┆ 0.385544 ┆ -0.705357 ┆ -0.824675 ┆ -0.565607 │
│ 19S ┆ -0.441128 ┆ 0.365522 ┆ -0.491072 ┆ -0.824675 ┆ 0.565607 │
└─────────┴───────────┴──────────┴───────────┴───────────┴───────────┘
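The robust scaling applied above can be sketched directly; this mirrors scikit-learn's RobustScaler default behaviour (centre on the median, scale by the interquartile range):

```python
import numpy as np

def robust_scale(col):
    """Centre by the median and scale by the IQR (25th-75th percentile)."""
    q1, med, q3 = np.percentile(col, [25, 50, 75])
    return (col - med) / (q3 - q1)

tmpf = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
print(robust_scale(tmpf))  # median 30, IQR 20 -> [-1. -0.5 0. 0.5 1.]
```

Unlike min-max scaling, the median/IQR statistics are barely affected by a few extreme observations, which is why this scaler suits weather data with outliers.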
Modeling and Results
Model 1: Graph Neural Network (DCRNN)
The Graph Neural Network was trained using the previous 28 time steps (equivalent to 7 days) and leveraged a dense spatial graph connecting all stations. This structure enabled the model to learn both temporal sequences and spatial diffusion patterns across the weather station network.
Graph Structure: Dense graph with edge weights based on inverse geodesic distance
Architecture: Three stacked DCRNN layers + ReLU activations + Linear projection
Target: Next-step temperature prediction for each node
Results:
MSE: 0.0562
It is visually apparent that the model follows the temporal trends of the weather data. However, there is a noticeable latency in the predictions, and a maximum absolute error of 1 is quite extreme given that the scaled test values range from roughly -1.5 to 1.5.
Model 2: Linear Regression
The linear baseline model was trained using the same 28-time-step history with aggregated weather station data to predict the next time step’s temperature. All features were flattened into a single vector, treating the problem as a high-dimensional regression task with no spatial awareness.
Strengths: Simplicity, interpretability, low training time
Weakness: Cannot leverage inter-station relationships or dynamic spatial trends.
MSE: 0.0147
It is visually apparent that the model performs quite a bit better than the GNN as there is little latency in the predicted results.
It is also clear that the maximum absolute error is roughly 0.8, notably less than the 1 observed for the GNN model.
When both models are compared against each other per station, it becomes apparent how poorly the GNN performs on these predictive tasks.
Key Findings and Interpretation
Small graph structures may not help: The GNN severely underperformed the linear model in both MSE and absolute error across all stations, highlighting the strength of a standard linear model on a small-scale weather system.
Temporal Context is Crucial: Both models benefitted from the use of 28 historical time steps, suggesting that short-term temporal trends are strong predictors of near-future temperature.
Feature Engineering Adds Value: Replacing wind direction with sin/cosine components improved learning stability and reduced directional ambiguity.
Graph Structure is Important: The small scale of the graph most likely played a significant role in the underperformance of the GNN model.
Static Graphs are Restrictive: Not applying an encoding and decoding layer to the GNN most likely limited information transfer between nodes; however, this may not have mattered for such a small system.
Conclusion
Summary of Key Results
A Graph Neural Network trained on a spatiotemporal weather graph may not outperform a standard linear regression baseline for short-term temperature prediction.
Incorporating spatial structure through graph edges allowed the GNN to represent regional weather interactions that linear models cannot, though this did not translate into better accuracy here.
Careful data preprocessing, including imputation, scaling, and circular feature handling, was essential to achieving strong performance from both models.
GNNs are incredibly sensitive to parameter tuning and may outperform if provided a much larger model structure.
Implications and Future Work
These findings demonstrate the potential of traditional models compared to graph-based deep learning approaches. However, the aggregation of stations used for the linear model most likely only worked because of the stations' close proximity. If the dataset instead spanned stations across the entire US, we doubt it would be possible to aggregate in a way that preserves spatial information. For this reason, we still believe there is potential in graph-based deep learning approaches for large-scale weather forecasting.
Future improvements could include:
Proper multivariate analysis
Larger geographic area
Application of an encoding-decoding step
More advanced GNN variants
Exploring other imputation techniques